Reviews: Neural Tangent Kernel: Convergence and Generalization in Neural Networks
The authors prove that networks of infinite width trained with SGD and (infinitely) small step size evolve according to a differential equation, whose solution depends only on the covariance kernel of the data and, in the case of L2 regression, on the eigenspectrum of the kernel. I believe this is a breakthrough result in the field of neural network theory. It elevates the analysis of infinitely wide networks from the study of the static initial function to closely predicting the entire training path. There is a plethora of powerful consequences for infinitely wide, fully-connected networks (a minimal sketch of the resulting dynamics follows after this list):

- They cannot learn information not contained in the covariance matrix.
- The change to the latent representations and parameters tends to zero as the width goes to infinity. Therefore, choosing nonlinearities in all layers reduces to choosing a single 1d function.
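
To make the dependence on the kernel eigenspectrum concrete, here is a minimal sketch of the claimed training dynamics for L2 regression. It is not the paper's construction: it uses a placeholder RBF Gram matrix in place of the neural tangent kernel and hypothetical toy data, but it illustrates how, under kernel gradient flow df/dt = -eta * K (f - y), each eigencomponent of the residual decays at a rate set by the corresponding eigenvalue.

```python
import numpy as np

# Hypothetical toy data; K stands in for the (fixed) NTK Gram matrix on the training set.
rng = np.random.default_rng(0)
n = 20
X = rng.normal(size=(n, 3))
y = np.sin(X[:, 0])              # targets
f0 = np.zeros(n)                 # network outputs at initialization (assumed zero here)

# Placeholder kernel: an RBF Gram matrix used purely in place of the NTK.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq_dists)

# Eigendecomposition of the kernel: under gradient flow with squared loss,
# each eigencomponent of the residual decays independently at rate eta * lambda_i.
lams, U = np.linalg.eigh(K)

def f_train(t, eta=1.0):
    """Training-set predictions at time t under kernel gradient flow:
    df/dt = -eta * K (f - y)  =>  f(t) = y + U diag(exp(-eta*lams*t)) U^T (f0 - y)."""
    decay = np.exp(-eta * lams * t)
    return y + U @ (decay * (U.T @ (f0 - y)))

for t in [0.0, 1.0, 10.0, 100.0]:
    resid = np.linalg.norm(f_train(t) - y)
    print(f"t={t:6.1f}  ||f_t - y|| = {resid:.4f}")
```

The residual norm shrinks monotonically, with the directions aligned to large kernel eigenvalues fitted first, which is the sense in which the training path is determined by the kernel and its eigenspectrum alone.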